Search | WHO COVID-19 Research Database

Intrahost SARS-CoV-2 k-mer Identification Method (iSKIM) for Rapid Detection of Mutations of Concern Reveals Emergence of Global Mutation Patterns.

Thommana, Ashley; Shakya, Migun; Gandhi, Jaykumar; Fung, Christian K; Chain, Patrick S G; Maljkovic Berry, Irina; Conte, Matthew A.

Viruses ; 14(10), 2022 09 27.

Article in English | MEDLINE | ID: covidwho-2066541

ABSTRACT

Despite unprecedented global sequencing and surveillance of SARS-CoV-2, timely identification of the emergence and spread of novel variants of concern (VoCs) remains a challenge. Several million raw genome sequencing runs are now publicly available. We sought to survey these datasets for intrahost variation to study emerging mutations of concern. We developed iSKIM ("intrahost SARS-CoV-2 k-mer identification method") to screen the many SARS-CoV-2 datasets relatively quickly and efficiently and identify intrahost mutations belonging to lineages of concern. Certain mutations surged in frequency as intrahost minor variants just prior to, or while, lineages of concern arose. The Spike N501Y change common to several VoCs was found as a minor variant in 834 samples as early as October 2020. This coincides with the timing of the first detected samples with this mutation in the Alpha/B.1.1.7 and Beta/B.1.351 lineages. Using iSKIM, we also found that Spike L452R was detected as an intrahost minor variant as early as September 2020, prior to the observed rise of the Epsilon/B.1.429/B.1.427 lineages in late 2020. iSKIM rapidly screens for mutations of interest in raw data, prior to genome assembly, and can be used to detect increases in intrahost variants, potentially providing an early indication of novel variant spread.
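The abstract does not describe iSKIM's internals, so the following is only a minimal sketch of the general idea of k-mer screening on raw reads before assembly. It assumes each mutation of interest is represented by one reference k-mer and one mutant k-mer spanning the site; the function names, the toy 11-mers, and the toy reads are all hypothetical and are not taken from iSKIM or from the SARS-CoV-2 genome.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, so reads from either strand can be matched."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Yield every k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def screen_reads(reads, ref_kmer, mut_kmer):
    """Count reads supporting the reference or the mutant allele.

    A read is counted once per allele it contains, on either strand.
    Returns (reads_with_reference, reads_with_mutant).
    """
    k = len(ref_kmer)
    targets = {
        ref_kmer: "ref", revcomp(ref_kmer): "ref",
        mut_kmer: "mut", revcomp(mut_kmer): "mut",
    }
    counts = Counter()
    for read in reads:
        alleles = {targets[km] for km in kmers(read.upper(), k) if km in targets}
        counts.update(alleles)
    return counts["ref"], counts["mut"]


def minor_variant_fraction(ref_count, mut_count):
    """Fraction of informative reads that carry the mutant k-mer."""
    total = ref_count + mut_count
    return mut_count / total if total else 0.0


# Toy data (hypothetical, not real viral sequence): the two k-mers differ
# by a single base at the centre position.
REF_KMER = "ACGTTGCAAGT"
MUT_KMER = "ACGTTACAAGT"
toy_reads = ["GG" + REF_KMER + "TT"] * 8
toy_reads += ["CC" + MUT_KMER + "AA", revcomp("CC" + MUT_KMER + "AA")]

ref_n, mut_n = screen_reads(toy_reads, REF_KMER, MUT_KMER)
fraction = minor_variant_fraction(ref_n, mut_n)
```

In this toy run the mutant k-mer is present in a minority of reads, which is the kind of signal the abstract calls an intrahost minor variant. A real screen would also need quality filtering and a minimum read depth, neither of which is modelled here.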

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , COVID-19/epidemiology , Mutation , Spike Glycoprotein, Coronavirus/genetics

Benchmark datasets for SARS-CoV-2 surveillance bioinformatics.

Xiaoli, Lingzi; Hagey, Jill V; Park, Daniel J; Gulvik, Christopher A; Young, Erin L; Alikhan, Nabil-Fareed; Lawsin, Adrian; Hassell, Norman; Knipe, Kristen; Oakeson, Kelly F; Retchless, Adam C; Shakya, Migun; Lo, Chien-Chi; Chain, Patrick; Page, Andrew J; Metcalf, Benjamin J; Su, Michelle; Rowell, Jessica; Vidyaprakash, Eshaw; Paden, Clinton R; Huang, Andrew D; Roellig, Dawn; Patel, Ketan; Winglee, Kathryn; Weigand, Michael R; Katz, Lee S.

PeerJ ; 10: e13821, 2022.

Article in English | MEDLINE | ID: covidwho-2010486

ABSTRACT

Background: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), has spread globally and is being surveilled with an international genome sequencing effort. Surveillance consists of sample acquisition, library preparation, and whole genome sequencing. This has necessitated a classification scheme detailing Variants of Concern (VOC) and Variants of Interest (VOI), and the rapid expansion of bioinformatics tools for sequence analysis. These bioinformatic tools are means for major actionable results: maintaining quality assurance and checks, defining population structure, performing genomic epidemiology, and inferring lineage to allow reliable and actionable identification and classification. Additionally, the pandemic has required public health laboratories to reach high throughput proficiency in sequencing library preparation and downstream data analysis rapidly. However, both processes can be limited by a lack of a standardized sequence dataset. Methods: We identified six SARS-CoV-2 sequence datasets from recent publications, public databases and internal resources. In addition, we created a method to mine public databases to identify representative genomes for these datasets. Using this novel method, we identified several genomes as either VOI/VOC representatives or non-VOI/VOC representatives. To describe each dataset, we utilized a previously published datasets format, which describes accession information and whole dataset information. Additionally, a script from the same publication has been enhanced to download and verify all data from this study. Results: The benchmark datasets focus on the two most widely used sequencing platforms: long read sequencing data from the Oxford Nanopore Technologies platform and short read sequencing data from the Illumina platform. There are six datasets: three were derived from recent publications; two were derived from data mining public databases to answer common questions not covered by published datasets; one unique dataset representing common sequence failures was obtained by rigorously scrutinizing data that did not pass quality checks. The dataset summary table, data mining script and quality control (QC) values for all sequence data are publicly available on GitHub: https://github.com/CDCgov/datasets-sars-cov-2. Discussion: The datasets presented here were generated to help public health laboratories build sequencing and bioinformatics capacity, benchmark different workflows and pipelines, and calibrate QC thresholds to ensure sequencing quality. Together, improvements in these areas support accurate and timely outbreak investigation and surveillance, providing actionable data for pandemic management. Furthermore, these publicly available and standardized benchmark data will facilitate the development and adjudication of new pipelines.
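The abstract mentions a script that downloads and verifies the benchmark data but does not say how verification is done. One common approach, assumed here purely for illustration, is to compare each downloaded file against an expected SHA-256 digest. The function names and the filename-to-digest mapping below are hypothetical and do not reflect the actual script or the datasets format.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large sequencing files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(expected, directory):
    """Check downloaded files against expected digests.

    `expected` maps a filename to its hex SHA-256 digest (assumed layout).
    Returns a dict mapping each filename to "ok", "mismatch", or "missing".
    """
    report = {}
    for name, wanted in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            report[name] = "missing"
        elif sha256_of(path) == wanted.lower():
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Reporting a status per file, rather than stopping at the first failure, lets a laboratory see in one pass which accessions need to be downloaded again.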

EDGE COVID-19: A Web Platform to generate submission-ready genomes from SARS-CoV-2 sequencing efforts.

Lo, Chien-Chi; Shakya, Migun; Connor, Ryan; Davenport, Karen; Flynn, Mark; Myers Y Gutiérrez, Adán; Hu, Bin; Li, Po-E; Player Jackson, Elais; Xu, Yan; Chain, Patrick S G.

Bioinformatics ; 2022 Mar 24.

Article in English | MEDLINE | ID: covidwho-1758637

ABSTRACT

SUMMARY: Genomics has become an essential technology for surveilling emerging infectious disease outbreaks. A range of technologies and strategies for pathogen genome enrichment and sequencing are being used by laboratories worldwide, together with different, and sometimes ad hoc, analytical procedures for generating genome sequences. A fully integrated analytical process for raw sequence to consensus genome determination, suited to outbreaks such as the ongoing COVID-19 pandemic, is critical to provide a solid genomic basis for epidemiological analyses and well-informed decision making. We have developed a web-based platform and integrated bioinformatic workflows that help to provide consistent high-quality analysis of SARS-CoV-2 sequencing data generated with either Illumina or Oxford Nanopore Technologies (ONT) platforms. Using an intuitive web-based interface, this workflow automates data quality control, SARS-CoV-2 reference-based genome variant and consensus calling, and lineage determination, and provides the ability to submit the consensus sequence and necessary metadata to GenBank, GISAID, and INSDC raw data repositories. We tested workflow usability using real world data and validated the accuracy of variant and lineage analysis using several test datasets, and further performed detailed comparisons with results from the COVID-19 Galaxy Project workflow. Our analyses indicate that EC-19 workflows generate high quality SARS-CoV-2 genomes. Finally, we share a perspective on patterns and impact observed with Illumina vs ONT technologies on workflow congruence and differences. AVAILABILITY: https://edge-covid19.edgebioinformatics.org, and https://github.com/LANL-Bioinformatics/EDGE/tree/SARS-CoV2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

A public website for the automated assessment and validation of SARS-CoV-2 diagnostic PCR assays.

Li, Po-E; Myers Y Gutiérrez, Adán; Davenport, Karen; Flynn, Mark; Hu, Bin; Lo, Chien-Chi; Player Jackson, Elais; Shakya, Migun; Xu, Yan; Gans, Jason D; Chain, Patrick S G.

Bioinformatics ; 37(7): 1024-1025, 2021 05 17.

Article in English | MEDLINE | ID: covidwho-706027

ABSTRACT

SUMMARY: Polymerase chain reaction-based assays are the current gold standard for detecting and diagnosing SARS-CoV-2. However, as SARS-CoV-2 mutates, we need to constantly assess whether existing PCR-based assays will continue to detect all known viral strains. To enable the continuous monitoring of SARS-CoV-2 assays, we have developed a web-based assay validation algorithm that checks existing PCR-based assays against the ever-expanding genome databases for SARS-CoV-2 using both thermodynamic and edit-distance metrics. The assay-screening results are displayed as a heatmap, showing the number of mismatches between each detection and each SARS-CoV-2 genome sequence. Using a mismatch threshold to define detection failure, assay performance is summarized with the true-positive rate (recall) to simplify assay comparisons. AVAILABILITY AND IMPLEMENTATION: The assay evaluation website and supporting software are Open Source and freely available at https://covid19.edgebioinformatics.org/#/assayValidation, https://github.com/jgans/thermonucleotideBLAST and https://github.com/LANL-Bioinformatics/assay_validation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
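The scoring described in this abstract (mismatch counts per assay and genome, a mismatch threshold for detection failure, and recall as the summary) can be illustrated with a small sketch. It is a simplification, not the published algorithm: it uses the minimum Hamming distance over ungapped windows in place of the thermodynamic and edit-distance metrics, it checks only the given strand, and the oligo, genomes, and threshold are toy values chosen for illustration.

```python
def min_mismatches(oligo, genome):
    """Fewest mismatches between an oligo and any same-length genome window.

    Ungapped comparison (Hamming distance); insertions and deletions are
    not modelled in this simplified sketch.
    """
    k = len(oligo)
    if len(genome) < k:
        return k  # treat a too-short genome as a complete mismatch
    return min(
        sum(a != b for a, b in zip(oligo, genome[i:i + k]))
        for i in range(len(genome) - k + 1)
    )


def mismatch_row(oligo, genomes):
    """One heatmap row: mismatch count of a single oligo against each genome."""
    return [min_mismatches(oligo, g) for g in genomes]


def recall(oligos, genomes, max_mismatches):
    """Fraction of genomes for which every oligo is within the threshold."""
    detected = sum(
        all(min_mismatches(o, g) <= max_mismatches for o in oligos)
        for g in genomes
    )
    return detected / len(genomes)


# Toy inputs (hypothetical): one 6-base oligo and three short "genomes".
toy_oligo = "ACGTAC"
toy_genomes = ["TTACGTACTT", "TTACGAACTT", "TTTTTTTTTT"]

row = mismatch_row(toy_oligo, toy_genomes)
assay_recall = recall([toy_oligo], toy_genomes, max_mismatches=1)
```

With a threshold of one mismatch, the first two toy genomes count as detected and the third as a failure, so the recall for this single-oligo assay is two out of three. A real assay has several oligos (primers and probe), which is why `recall` takes a list.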

Subject(s)

COVID-19 , SARS-CoV-2 , COVID-19 Testing , Humans , Polymerase Chain Reaction , Sensitivity and Specificity
